ai text detector
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
Cheng, Yize, Sadasivan, Vinu Sankar, Saberi, Mehrdad, Saha, Shoumik, Feizi, Soheil
The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across several detection systems. For instance, compared to a simple paraphrasing attack--which, ironically, increases the true positive rate at 1% false positive rate (T@1%F) by 8.57% on RADAR and 15.03% on Fast-DetectGPT--adversarial paraphrasing, guided by OpenAI-RoBERTa-Large, reduces T@1%F by 64.49% on RADAR and a striking 98.96% on Fast-DetectGPT. Across a diverse set of detectors--including neural network-based, watermark-based, and zero-shot approaches--our attack achieves an average T@1%F reduction of 87.88% under the guidance of OpenAI-RoBERTa-Large. We also analyze the tradeoff between text quality and attack success and find that our method can significantly reduce detection rates with mostly a slight degradation in text quality. Our adversarial setup highlights the need for more robust and resilient detection strategies in light of increasingly sophisticated evasion techniques.
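The detector-guided selection described in the abstract can be sketched in a few lines: generate several candidate paraphrases, score each with a guidance detector, and keep the one the detector finds least AI-like. This is a minimal illustration, not the authors' implementation; `paraphrase_candidates` and `detector_ai_prob` are hypothetical stand-ins for an instruction-following LLM and a classifier such as OpenAI-RoBERTa-Large.

```python
import random

def paraphrase_candidates(text, n=4):
    """Hypothetical stand-in for an instruction-following LLM asked
    for n candidate paraphrases. To stay self-contained, it merely
    permutes sentence order rather than calling a real model."""
    sentences = text.split(". ")
    rng = random.Random(0)
    candidates = []
    for _ in range(n):
        perm = sentences[:]
        rng.shuffle(perm)
        candidates.append(". ".join(perm))
    return candidates

def detector_ai_prob(text):
    """Hypothetical stand-in for the guidance detector, returning
    P(AI-generated) in [0, 1]; a toy length heuristic here."""
    return min(1.0, len(text) / 1000.0)

def adversarial_paraphrase(text, n=4):
    """Detector-guided selection: keep the candidate the guidance
    detector scores as least likely to be AI-generated."""
    return min(paraphrase_candidates(text, n), key=detector_ai_prob)
```

In the real attack the guidance detector's score steers generation toward text that also transfers against unseen detectors; the sketch only shows the select-the-lowest-score loop.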
AuthorMist: Evading AI Text Detectors with Reinforcement Learning
In the age of powerful AI-generated text, automatic detectors have emerged to identify machine-written content. This poses a threat to author privacy and freedom, as text authored with AI assistance may be unfairly flagged. We propose AuthorMist, a novel reinforcement learning-based system to transform AI-generated text into human-like writing. AuthorMist leverages a 3-billion-parameter language model as a backbone, fine-tuned with Group Relative Policy Optimization (GRPO) to paraphrase text in a way that evades AI detectors. Our framework establishes a generic approach where external detector APIs (GPTZero, WinstonAI, Originality.ai, etc.) serve as reward functions within the reinforcement learning loop, enabling the model to systematically learn outputs that these detectors are less likely to classify as AI-generated. This API-as-reward methodology can be applied broadly to optimize text against any detector with an accessible interface. Experiments on multiple datasets and detectors demonstrate that AuthorMist effectively reduces the detectability of AI-generated text while preserving the original meaning. Our evaluation shows attack success rates ranging from 78.6% to 96.2% against individual detectors, significantly outperforming baseline paraphrasing methods. AuthorMist maintains high semantic similarity (above 0.94) with the original text while successfully evading detection. These results highlight limitations in current AI text detection technologies and raise questions about the sustainability of the detection-evasion arms race.
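The API-as-reward idea above can be sketched as a reward function plus GRPO-style group-relative advantages: reward each sampled paraphrase by how little the external detectors flag it, then center and scale rewards within the sampled group. This is a minimal sketch under stated assumptions, not AuthorMist's code; the `detectors` callables are hypothetical stand-ins for services like GPTZero or Originality.ai.

```python
from statistics import mean, pstdev

def detector_reward(text, detectors):
    """Reward for an RL paraphraser: average of (1 - P(AI)) over
    external detector APIs. `detectors` holds hypothetical callables
    that each return P(AI-generated) in [0, 1]."""
    probs = [d(text) for d in detectors]
    return 1.0 - mean(probs)

def group_relative_advantages(rewards):
    """GRPO-style advantages: each sampled paraphrase's reward is
    centered and scaled against its own sampled group, so the policy
    update needs no separate value model."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

Because the reward only needs a black-box probability per text, any detector with an accessible interface can be plugged in, which is what makes the methodology generic.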
DUPE: Detection Undermining via Prompt Engineering for Deepfake Text
Weichert, James, Dimobi, Chinecherem
As large language models (LLMs) become increasingly commonplace, concern about distinguishing between human and AI text increases as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing a technology with which they are unfamiliar, teachers may turn to publicly-available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, posing potential harm to students who are falsely accused of academic dishonesty. In this paper, we evaluate three different AI text detectors--Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero--against human and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.
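The false positive and false negative rates that this evaluation centers on are easy to pin down concretely. As a minimal sketch (not the paper's evaluation code), with label/prediction 1 meaning "AI-generated":

```python
def error_rates(labels, preds):
    """False positive rate (human text wrongly flagged as AI) and
    false negative rate (AI text missed by the detector)."""
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives
```

A paraphrasing attack like the one in the paper drives the false negative rate up without touching the false positive rate, which is why both numbers have to be reported separately.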
GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education
Perkins, Mike, Roe, Jasper, Vu, Binh H., Postma, Darius, Hickerson, Don, McGaughran, James, Khuat, Huy Q.
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified using techniques designed to evade detection by these tools (n=805). The results demonstrate that the detectors' already low accuracy rates (39.5%) show major reductions in accuracy (17.4%) when faced with manipulated content, with some techniques proving more effective than others in evading detection. The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred, underscoring the challenges educators face in maintaining inclusive and fair assessment practices. However, they may have a role in supporting student learning and maintaining academic integrity when used in a non-punitive manner. These results underscore the need for a combined approach to addressing the challenges posed by GenAI in academia to promote the responsible and equitable use of these emerging technologies. The study concludes that the current limitations of AI text detectors require a critical approach to any possible implementation in higher education and highlight possible alternatives to AI assessment strategies.
Most sites claiming to catch AI-written text fail spectacularly • TechCrunch
As the fervor around generative AI grows, critics have called on the creators of the tech to take steps to mitigate its potentially harmful effects. Text-generating AI in particular has gotten a lot of attention -- and with good reason. Students could use it to plagiarize, content farms could use it to spam and bad actors could use it to spread misinformation. OpenAI bowed to pressure several weeks ago, releasing a classifier tool that attempts to distinguish between human-written and synthetic text. But it's not particularly accurate; OpenAI estimates that it misses 74% of AI-generated text. In the absence of a reliable way to spot text originating from an AI, a cottage industry of detector services has sprung up.